Transaction-Safe FAT File System Improvements

ABSTRACT

Concepts for enhancing operation of transaction-safe file allocation table systems are described. The concepts include writing a file to non-volatile memory media and rendering an update of file size to the TFAT storage medium; and receiving a request to locate data in a non-volatile memory having a TFAT file management system, selecting a sector of the memory to parse to locate the data, determining when the selected sector is a first sector of a directory or subdirectory of the memory and when determining reveals that the selected sector is a first sector, skipping reading data from the selected sector. The concepts also include flushing a cache and synchronizing FATs.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/235,901, filed Sep. 19, 2011, which is a divisional of U.S. patent application Ser. No. 12/057,023, filed on Mar. 27, 2008 and issued as U.S. Pat. No. 8,024,507 on Sep. 20, 2011, entitled “Transaction-Safe FAT File System Improvements,” which in turn is a divisional of U.S. patent application Ser. No. 10/876,425, filed on Jun. 25, 2004 and issued as U.S. Pat. No. 7,363,540 on Apr. 22, 2008, entitled “Transaction-Safe FAT File System Improvements,” which in turn is a continuation-in-part of U.S. patent application Ser. No. 10/431,009, filed on May 7, 2003 and issued as U.S. Pat. No. 7,174,420 on Feb. 6, 2007, entitled “Transaction-Safe FAT Files System,” which claims the benefit of U.S. Provisional Application No. 60/420,541, filed on Oct. 22, 2002, entitled “Transaction-Safe FAT Files Subsystem,” listing Michael D. Malueg, Hang Li, Yadhu N. Gopalan, Ronald O. Radko, Daniel J. Polivy, Sharon Drasnin, Jason R. Farmer, and DaiQian Huang as inventors. The above-identified applications are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

This disclosure relates to Transaction-safe File Allocation Table (TFAT) file systems designed to reduce the probability that a computer file system becomes corrupted in the event of power loss during a write cycle, and, more particularly, to TFAT file systems capable of restoring system settings in the event of a power-on-reset (POR) event.

BACKGROUND

Computer systems employ multiple memory types, including ROM, volatile rapid access memories and non-volatile memories. ROM may be used to implement a basic input output system (a.k.a. BIOS) by having a power on reset circuit that causes the information stored in the ROM to be read and employed by a processor when the power is reset to the computer system. This is an example of a non-volatile memory, or a memory that retains stored data even when no electrical power is being supplied to the computer system.

Volatile rapid access memories, such as cache memories and dynamic random access memories (DRAMs), are used to store information elements such as data and instructions, and especially those information elements that are repeatedly needed by the processor. Volatile memories are incapable of storing data for any significant period of time in the absence of externally-supplied electrical power.

Computer systems typically include multiple non-volatile memory devices, which have evolved from punch card decks and paper tape systems, through large magnetic disc systems to include compact disc memories, floppy discs, small, high capacity disc systems, flash memory systems and other forms of non-volatile data storage devices.

Disc drive data storage systems are typically much slower than many other types of memory but provide high data storage capacity in a relatively attractive form factor and at a relatively low cost per stored bit. These types of memories include electromechanical components, and, accordingly, are limited in speed of operation. As a result, the probability that a power interruption may occur when data are being written to the memory device is increased, relative to some other types of memory. In order to be able to determine which data were written to the disc, and to be able to determine where on the disc the stored data are located, a file allocation table (FAT) system is employed. Several different kinds of FATs have been developed, including FAT12, FAT16 and FAT32, to address needs of different systems.

In a conventional FAT file system, when a file is modified, new data or changes to an existing file are written over and/or appended to a previous version of the file. Additionally, a log file is created of operations that will involve writing data to the non-volatile data storage device. Following writing of the new data or changes, the FAT is updated and the log is erased. Such FAT file systems track completed transactions, and are called “transactioned” file systems.

The conventional FAT file system is vulnerable to corruption from a “torn write”, e.g., a write operation that is interrupted such as by an intervening power loss, or when storage media are disconnected during a write, because of the procedure used to store data. Should power fail after initiation of a write of new data to a file, but before or during the corresponding FAT write operation, the entire file system can be damaged or destroyed. While the likelihood of complete file system loss is small, there is a large probability of lost cluster chains that will require some form of servicing by a utility such as scandisk or chkdsk.

FAT file systems by design are not transaction-safe file systems. The FAT file system can be corrupted when a write operation is interrupted during a write transaction (a “torn write”) due to power loss or removal of the storage medium. The FAT is corrupted when the contents of the FAT do not agree with the contents of the directory or data sections of the volume. When this happens, the user will lose some data.

Even when transaction-safe capabilities are included by, for example, use of multiple FATs, together with tracking to ensure that the last good FAT and last good data are identified or identifiable, the system typically reverts to factory or default settings. In other words, the user selections for configuration settings, network settings, email settings and the like may be replaced with default settings and thus need to be reset by the user after a POR event.

This is not desirable in certain computer systems, such as those embedded computer systems where the data integrity is a high priority requirement. In order to reduce these data corruption issues, a new FAT solution is needed for such computer systems that also allows existing systems to access the storage medium and that is compatible with existing systems.

SUMMARY

In one aspect, the present disclosure describes a process for maintaining multiple transaction-safe file allocation tables (TFATs) for a volume of TFAT storage medium. The process includes acts of writing a file to non-volatile memory media and rendering an update of file size to the TFAT storage medium.

In another aspect, the present disclosure describes one or more computer readable media having stored thereon a plurality of instructions that, when executed by one or more processors, causes the one or more processors to modify data represented by at least a first sector on a storage medium such that the one or more processors perform acts including receiving a request to locate data in a non-volatile memory having a TFAT file management system and selecting a sector of the memory to parse to locate the data. The instructions are also configured to cause the one or more processors to perform acts of determining when the selected sector is a first sector of a directory or subdirectory of the memory, and, when determining reveals that the selected sector is a first sector, skipping reading data from the selected sector.

In a further aspect, the present disclosure describes a process for maintaining transaction-safe file allocation tables (TFATs) for a volume of TFAT storage medium. The process includes acts of determining when a write request includes need for writing new data over at least a portion of an entire cluster; and, when determining indicates that the entire cluster will be rewritten, the process further includes acts of locating a new cluster location and writing the new data over the new cluster without first re-writing old data in the new cluster location.

In an additional aspect, the present disclosure contemplates computer readable media including computer-readable instructions configured to cause one or more processors to open a file for writing in a write-through mode, first write a first page of data to a first location in the file within a TFAT volume and, second write a second page of data to a second location in the file within the TFAT volume, wherein the first and second write comprise an atomic write.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a block diagram of an exemplary embedded computer system including non-volatile memory.

FIG. 1B is a block diagram representing an exemplary operating system and FAT file system suitable for use with the computer of FIG. 1A.

FIG. 2 is a block diagram representing an exemplary transaction-safe file allocation table (TFAT) file system implemented together with a volume of the non-volatile memory of FIG. 1A.

FIG. 3A is a flowchart of an exemplary process for creating directories and subdirectories that finds application with the TFAT file system of FIG. 2.

FIG. 3B is a flowchart of an exemplary process for parsing directories and subdirectories that finds application with the TFAT file system of FIG. 2.

FIG. 4 is a flowchart of an exemplary process for writing data to the non-volatile memory of FIG. 1A that includes the TFAT file system of FIG. 2.

FIG. 5 is a flowchart of an exemplary process for synchronizing TFAT volumes in the TFAT file system of FIG. 2.

FIG. 6 is a flowchart of an exemplary process for identification of TFAT volumes and for determining which TFAT is the last known good FAT when a volume of non-volatile memory is mounted in a system such as the computer system of FIG. 1A.

FIGS. 7A and 7B are flowcharts of exemplary processes for writing data to non-volatile storage media using the TFAT file system of FIG. 2.

FIGS. 8A and 8B are block diagrams showing relationships between sectors forming an exemplary FAT chain for a given file, before and after a write operation.

FIG. 9 is a flowchart of a process for making file size transaction safe.

FIG. 10 is a flowchart of an exemplary process for writing a new cluster in the TFAT file system of FIG. 2, whereby a conventional TFAT copy cluster is not done when new data will replace/overwrite the existing data.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1A is a block diagram of a representative computer system 100. In one embodiment, the computer system 100 is embedded within an appliance or vehicle (not illustrated) and facilitates control of various subsystems, coordination between subsystems, data and usage logging and also facilitates interfacing with external computer devices (not shown). The computer system 100 includes a processor 110, a cache memory 112, a bus 120 coupled to the processor 110 and a memory system 130 coupled to the bus 120. The memory system 130 typically includes a memory management unit 132 coupled to the bus 120 and to ROM 134, temporary storage memory 136 such as DRAM or SRAM and non-volatile memory 138.

The cache memory 112 typically employs a limited amount of high-speed memory to facilitate rapid operation of the processor 110. For example, the cache memory may store frequently-accessed information and/or instructions, or may provide a way for the processor to rapidly write data for later incorporation into a slower portion of memory as the processor executes other tasks.

Non-volatile memory 138 may include non-removable media, which may include NAND/NOR flash memory and hard drives. Non-volatile memory 138 may also include removable media, such as Compact-Flash (CF) cards, Secure-Digital (SD) cards, magnetic or optical discs and other removable mass storage devices.

Discs are typically organized into portions known as “clusters” that are differentiated by addresses. A cluster is a sequence of contiguous sectors or linked sectors representing portions of a disc, for example. A cluster is defined as a group of 1 or more sectors and is determined during formatting. When a file is written to the disc, it may be written to one cluster or it may require several clusters. The several clusters containing data representing a file may be contiguous but often are not. As a result, it is necessary to have a master list of the clusters into which a given file is written and for the list to provide the order in which the clusters are organized. Such a list is referred to as a “chain” of clusters. A group of such lists form a portion of the TFAT. The TFAT thus is a tool for data retrieval that permits data to be read from the storage medium in an organized manner. Other types of storage media may be organized to mimic the organization of a disc in order to be able to be accessed intelligibly by modules that are based on a disc model.
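
By way of illustration only, the notion of a “chain” of clusters may be sketched as follows. The sketch assumes a table held in a Python dictionary that maps each cluster number to the next cluster of the same file; the name cluster_chain and the END_OF_CHAIN marker are hypothetical and do not reflect the on-disk encoding of any particular FAT variant.

```python
# Illustrative sketch only: the allocation table is modeled as a dict, not as
# an on-disk FAT, and END_OF_CHAIN is an assumed marker value.
END_OF_CHAIN = -1


def cluster_chain(fat, first_cluster):
    """Return, in order, the clusters that hold one file."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            # A loop in the table would otherwise never terminate.
            raise ValueError("cycle in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # follow the link to the next cluster
    return chain


# A file stored in non-contiguous clusters 2 -> 5 -> 3.
example_fat = {2: 5, 5: 3, 3: END_OF_CHAIN, 4: END_OF_CHAIN}
print(cluster_chain(example_fat, 2))
```

The chain is recovered purely from the table, which is why the table and the data regions must agree for a file to be readable.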

Computer system 100 typically includes at least some form of computer readable media. Computer readable media can be any available media that can be accessed by computer 100. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.

Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other media which can be used to store the desired information and which can be accessed by computer system 100. Communication media typically embodies computer readable instructions, data structures, program logic or program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Any of the above or combinations of any of the above should also be included within the scope of computer readable media.

The computer system 100 also includes one or more interfaces 140. Interfaces 140 may permit the computer system 100 to accept user input, for example via a keyboard, voice control, touch screen or other tactile, auditory, electrical or optical input device, and may permit information to be passed to a user via auditory or optical devices. Interfaces 140 may also couple the computer system 100 to an appliance (not illustrated), such as a global positioning system, or to a vehicle, or to other types of systems such as the Internet or other communications systems.

Interfaces 140 may also allow external computer systems (not shown) to interact with the computer system 100. For example, data such as accumulated distance traveled, service logs, malfunction logs applicable to associated subsystems, positional data describing historical performance data relevant to the computer system 100 and/or associated equipment and the like may be accessible to external computers via an interface 140. Similarly, modifications or upgrades to software associated with the computer system 100 may be coupled to the computer system 100 via an interface 140. Such could find utility in a vehicular application of the computer system 100, for example.

Alternatively, a removable portion of the non-volatile memory 138 may be decoupled from the computer system 100, temporarily or permanently, and interfaced with an external computer system (not shown), or vice versa. In either case, it is important to have some commonality of memory system organization to allow either the external computer system or the processor 110 to be able to read and/or write data to the memory system 130 or a detachable component of the non-volatile memory 138.

FIG. 1B is a block diagram showing an exemplary operating system 150 and TFAT file system 170 suitable for use with the computer 100 of FIG. 1A. The operating system 150 provides an environment in which applications may be employed by the computer 100. When the processor 110 encounters a write command, a TFAT control module 160 is invoked to cause the TFAT file system 170 to coordinate with the write command as data are being written to the non-volatile memory 138.

FIG. 2 is a block diagram representing an exemplary transaction-safe file allocation table (TFAT) system 200 (analogous to the TFAT file system 170 of FIG. 1B) implemented together with a volume 210 of the non-volatile memory 138 of FIG. 1A. The volume 210 includes a boot sector, BS/BPB 212, a first file allocation table FAT0 214, a second file allocation table FAT1 216 and a file and directory data region 218.

The following detailed description uses several terms of art. Definitions for some of these terms are given below.

STREAM. A stream is an abstraction of a file or directory and represents a continuous run of data, starting at offset 0, in one embodiment. Data can be read and written to the stream arbitrarily, and in arbitrary sizes by a file system. The file system maps the stream to the actual physical layout of the file on disk. An internal DSTREAM data structure stores information about the stream, and is often used in the file system code.

RUN. A run is a set of contiguous blocks of a file. Disk operations operate on contiguous data in a single operation. Accordingly, the run is an important part of all disk operations to a file or directory. The RUN data structure contains information about a run; the RUN structure stored in the DSTREAM contains information about the current run used in the last operation on the stream. The run usually contains information such as the starting and ending stream-relative offsets, and the volume-relative blocks corresponding to the offsets on disk.

Directory Entry (DIRENTRY). In one embodiment, DIRENTRY is a 32-byte structure. DIRENTRY contains information about a file or directory, and directories are composed of DIRENTRY structures. The internal DIRENTRY structure matches the format of the on-disk structure exactly.
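
By way of illustration only, a 32-byte directory entry may be unpacked as sketched below. The field layout is an assumption taken from the conventional FAT short directory entry, since the fields of DIRENTRY are not enumerated here; the names DirEntry, parse_direntry and first_cluster are hypothetical.

```python
import struct
from collections import namedtuple

# Assumed layout: the conventional little-endian FAT short directory entry.
# 11-byte name, attribute byte, two reserved/time bytes, then 16-bit time,
# date and cluster fields, and a 32-bit file size.
DIRENTRY_FORMAT = "<11sBBBHHHHHHHI"

DirEntry = namedtuple(
    "DirEntry",
    "name attr nt_reserved create_tenths create_time create_date "
    "access_date first_cluster_hi write_time write_date first_cluster_lo size",
)


def parse_direntry(raw):
    """Unpack one 32-byte directory entry into named fields."""
    if len(raw) != struct.calcsize(DIRENTRY_FORMAT):
        raise ValueError("a directory entry must be exactly 32 bytes")
    return DirEntry._make(struct.unpack(DIRENTRY_FORMAT, raw))


def first_cluster(entry):
    """Combine the high and low halves of the starting cluster number."""
    return (entry.first_cluster_hi << 16) | entry.first_cluster_lo


# Build a sample entry in memory and read it back.
sample = struct.pack(
    DIRENTRY_FORMAT, b"README  TXT", 0x20, 0, 0, 0, 0, 0, 1, 0, 0, 2, 1234
)
entry = parse_direntry(sample)
print(entry.name, first_cluster(entry), entry.size)
```

Because the internal structure is stated to match the on-disk format exactly, a fixed-size unpack of this kind is sufficient, and a directory can be read as a sequence of such 32-byte records.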

BUFFER. A buffer is an internal data structure that is used to buffer data that has been read from non-volatile memory such as a disk. The BUF structure stores information pertinent to a buffer, such as its current status, volume-relative block number, and a pointer to the actual data. Unless stream I/O is done in individual block-size chunks, it goes through the buffer subsystem.

SID or Unique Stream Identifier. This is an internal data structure that represents a unique ID for internal stream structures. SIDs are used throughout the file system code as a means for identifying streams, and in file system notifications. The DSID structure contains the cluster of the directory which contains the stream's DIRENTRY, and the ordinal in the directory of the stream's DIRENTRY. In conventional FAT volumes, this is guaranteed to be unique for each file (stream), and to never change.

Conventional FAT file systems assume that the starting cluster of a directory will never change. As a result, such systems use the directory cluster numbers as part of Stream IDs (SID). In TFAT file systems, changes to the first cluster of a file/directory would also necessitate rewriting the directory entry, for reasons discussed in more detail below. If all directory entries were in the first clusters of their parents' streams, then these changes propagate all the way to the root (because each modification requires a write to a new cluster, and if it is the first cluster of a file/directory, the directory entry needs to be updated for that new cluster, and so on).

In many file systems, a conventional directory is merely a collection of 32-byte DIRENTRYs, one after another, starting with two special system directory entries that are typically represented as ‘.’ (“dot”) and ‘..’ (“dot dot”). In a conventional FAT file system, these two system directory entries are associated with each directory, subdirectory and file stored on the storage device, except the root directory. With respect to each directory or subdirectory, the “dot” entry points to a current sector where the directory or subdirectory is stored. The “dot dot” entry points to a parent directory.

In one embodiment of TFAT, a modified directory structure prevents any changeable data from being stored in the first cluster of a directory stream to prevent propagation of these first-cluster modifications. The modified directory structure is implemented with a process 300, discussed below with reference to FIG. 3A.

FIG. 3A is a flowchart of an exemplary process 300 for creating directories and subdirectories that finds application with the TFAT file system 200 of FIG. 2.

The process 300 begins in a block 305. Typically, the process 300 is initiated as a part of a memory write.

In a block 310, the process 300 allocates a first region of the non-volatile memory 138 of FIG. 1A. In one embodiment, such corresponds to allocating first and second clusters for a subdirectory.

The process 300 then enters data corresponding to a directory or a subdirectory into a first portion of the directory or subdirectory in a block 320. In one embodiment, such corresponds to a parent directory and to a sector corresponding to the associated directory or subdirectory, i.e., entries analogous to the ‘.’ and ‘..’ entries discussed above.

In a block 330, the process 300 fills a remainder of the first cluster with unchangeable data. In one embodiment, such unchangeable data comprises volume labels. The process 300 then ends in a block 335.

In many conventional file systems, a single cluster is allocated for each newly created directory. Note that the root directory is a special case, and does not have the ‘.’ or ‘..’ entries present.

In one embodiment of a TFAT volume, when a first cluster is allocated for a new directory or subdirectory, only two DIRENTRYs (‘.’ and ‘..’ entries) are written when the new directory or subdirectory is created (block 310). The rest of the first cluster is filled (block 330) with data that are typically not overwritten by conventional system operations, i.e., data that are unchangeable. Examples of such data include volume labels.

In this embodiment, a second cluster is also allocated by TFAT (block 310) when the first cluster is allocated and written because the first cluster is already going to be filled (block 330). This embodiment requires a fixed overhead of an additional cluster for each directory. However, the performance gains obtained by not having propagating changes often outweigh the extra space required for each stored data file or subdirectory. In this embodiment, rewriting a directory entry does not cause changes to propagate up or down the directory hierarchy and instead requires relinking the FAT chain for the directory.
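
By way of illustration only, the acts of process 300 may be sketched as follows. The sketch models a directory as a Python dictionary and a directory entry as a (name, target) pair; the names create_directory, ENTRIES_PER_CLUSTER and VOLUME_LABEL_FILLER are hypothetical, and the cluster capacity of four entries is chosen only to keep the example small.

```python
from itertools import count

# Assumed, deliberately tiny cluster capacity (entries per cluster).
ENTRIES_PER_CLUSTER = 4
# Stand-in for an unchangeable volume-label entry.
VOLUME_LABEL_FILLER = ("<volume label>", None)


def create_directory(allocate_cluster, parent_cluster):
    """Create a directory whose first cluster never needs to change."""
    # Block 310: allocate the first and second clusters together.
    first = allocate_cluster()
    second = allocate_cluster()

    # Block 320: write only the '.' and '..' entries.
    first_entries = [(".", first), ("..", parent_cluster)]

    # Block 330: fill the remainder of the first cluster with unchangeable data.
    while len(first_entries) < ENTRIES_PER_CLUSTER:
        first_entries.append(VOLUME_LABEL_FILLER)

    # New files and subdirectories will land in the second cluster onward.
    return {
        "chain": [first, second],
        "clusters": {first: first_entries, second: []},
    }


allocator = count(10)  # hypothetical allocator handing out clusters 10, 11, ...
directory = create_directory(lambda: next(allocator), parent_cluster=2)
print(directory["chain"], directory["clusters"][10])
```

Since every later entry is placed in the second cluster or beyond, the first cluster number of the directory stays fixed, which is what keeps changes from propagating toward the root.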

Because the first cluster is filled with unchanging data such as volume labels instead of other data that may be changeable, file systems such as those for desktop computers never access the data stored in the portion of the first cluster after the ‘.’ and ‘..’ files or accidentally delete those data. However, such directories cannot be deleted by such types of computers and file systems running on operating systems such as the family of Windows® operating systems produced by Microsoft Corporation for application in desktop computers.

Files added to this directory by desktop-type computers using conventional file systems will also not occupy the first cluster because the first directory cluster is filled with unchangeable data such as volume labels. When a conventional directory is created by such computers, the first cluster will not be filled with data such as volume labels. As a result, file write operations performed by such computers on such directories are not transaction-safe.

For FAT12 and FAT16 file systems, the root directory is in a fixed location on the storage medium and has a fixed size. In such systems, the first root directory cluster cannot be filled with data such as volume labels. In FAT32 file systems, the root directory need not have a fixed location or size, but none of these FAT file systems provide a root directory that is transaction-safe, i.e., one which can be moved or remapped without risk of corruption.

In one embodiment, TFAT employs a first root directory in the conventional location that includes a pointer to a first subdirectory (block 310), which then effectively becomes the working “root” directory. When portions of the first root directory other than the pointer are filled (block 330) with unchangeable data, the data in the first root directory never changes. As a result, the first root directory cannot be corrupted by interruption of a write cycle and thus is transaction-safe. When the first subdirectory is also configured such that the first cluster contains “.” and “..” entries followed by unchangeable data, it also is transaction-safe. Additionally, this embodiment is backwards compatible with conventional FAT file systems.

However, the FindNext feature in many systems still searches through the data contained in sectors that include the “.” and “..” entries. Thus, the FindNext feature may be improved in TFAT by not searching through the first cluster of each TFAT directory.

FIG. 3B is a flowchart of an exemplary process 350 for parsing directories and subdirectories that finds application with the TFAT file system 200 of FIG. 2. The process 350 begins in a block 355.

In a block 360, the process 350 receives a request that requires searching of the memory. For example, the request may be a request to locate a file, such as a FindFirstFile request or a FindNextFile request. Servicing the request involves reading data from successive sectors of the memory in order to compare them to criteria associated with an objective of the search.

In a block 365, the process 350 selects a next cluster of the memory structure to parse in servicing the request. The next cluster is one that would, in conventional file systems, be read and then compared to criteria related to the search request in the course of the search.

In a query task 370, the process 350 determines when the selected cluster is a first cluster in a directory or subdirectory. When the process 350 determines that the selected cluster is a first cluster in a directory or subdirectory, control passes to a block 375. When the process 350 determines that the selected cluster is not a first cluster in a directory or subdirectory, control passes to a block 380. In the block 375, the entire selected first cluster of that directory is skipped. Control then passes back to the block 365.

In the block 380, the selected cluster is examined. Control then passes to a query task 390.

In the query task 390, the process 350 determines when the currently selected cluster is the last cluster to be searched. When the query task 390 determines that the selected cluster is the last cluster to be searched, control passes to a block 395. Determination that the currently selected cluster is the last cluster to be searched could result from location of the desired file or files, or from a determination that all applicable clusters have been searched. When the query task 390 determines that the selected cluster is not the last cluster to be searched, control passes back to the block 365.

In the block 395, the process 350 returns either a message indicating that the search criteria could not be satisfied or that the search criteria were satisfied. The process 350 then ends.
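
By way of illustration only, the control flow of process 350 may be sketched as follows. The sketch assumes a directory described by its ordered cluster chain and a caller-supplied read function; the names find_entry and read_cluster are hypothetical, and directory entries are again modeled as (name, target) pairs.

```python
def find_entry(chain, read_cluster, wanted_name):
    """Search a directory's clusters for a name, never reading the first one."""
    for index, cluster in enumerate(chain):       # block 365: select next cluster
        if index == 0:                            # query task 370
            continue                              # block 375: skip first cluster
        for name, target in read_cluster(cluster):  # block 380: examine cluster
            if name == wanted_name:
                return cluster, target            # block 395: criteria satisfied
    return None                                   # block 395: criteria not satisfied


# A three-cluster directory; the first cluster holds only fixed entries.
clusters = {
    10: [(".", 10), ("..", 2), ("<volume label>", None)],
    11: [("a.txt", 20)],
    12: [("b.txt", 21)],
}
reads = []  # records which clusters were actually read


def read_cluster(cluster):
    reads.append(cluster)
    return clusters[cluster]


print(find_entry([10, 11, 12], read_cluster, "b.txt"), reads)
```

In this sketch the list reads never contains the first cluster, which reflects the saving described for the search features: one cluster read is avoided per directory searched.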

Every directory in TFAT contains a dummy cluster as the first cluster of the directory, so that the first cluster does not have to be transacted. By not transacting the first cluster, the complexity of transacting a directory can be reduced. Thus, an improvement that prevents the first cluster of the directory from being read when tasks are being executed that involve reading data from a cluster and then comparing the read data to some set of criteria, followed by iteration of these tasks on a subsequent cluster until the task criteria are satisfied, is desirable. Examples of such tasks include searching for files and subdirectories, such as with the FindFirstFile/FindNextFile features. This reduces the time required to execute these features, because this first cluster contains no information having utility for these purposes.

In one embodiment of TFAT, at least two file allocation tables (corresponding to FAT0 214 and FAT1 216 of FIG. 2) are maintained, with one being active and the other being non-active at any one time. When a change occurs to data stored on a mass non-volatile data storage device (e.g., NVM 138 of FIG. 1A) such as a magnetic disk, that change is recorded in the non-active FAT table. In one embodiment, one bit in a master boot record (MBR) controls which FAT table is active.

When the entire write is complete and the non-active FAT table is completely updated to reflect the completed write, the active FAT bit in the MBR is flipped and the previously non-active FAT becomes the active FAT. This newly active TFAT is then copied over the new non-active TFAT. TFAT will only guarantee that the file system will stay intact during power loss. When a write and TFAT update operation is not yet complete and an interruption occurs, data involved in that write operation may be lost and it is up to the application or user to address the data loss.
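
By way of illustration only, the active/non-active arrangement may be sketched as follows. The class TwoFatVolume and its methods stage, flip and sync are hypothetical names; the attribute active stands in for the single bit said to reside in the master boot record, and each table is modeled as a Python dictionary.

```python
class TwoFatVolume:
    """Toy model of two allocation tables with one active at a time."""

    def __init__(self, fat):
        self.fats = [dict(fat), dict(fat)]
        self.active = 0  # stands in for the active-FAT bit

    def active_fat(self):
        return self.fats[self.active]

    def stage(self, changes):
        # Changes are recorded only in the non-active table.
        self.fats[1 - self.active].update(changes)

    def flip(self):
        # Once the non-active table is complete, it becomes the active one.
        self.active = 1 - self.active

    def sync(self):
        # The newly active table is copied over the new non-active table.
        self.fats[1 - self.active] = dict(self.fats[self.active])


volume = TwoFatVolume({2: -1})
volume.stage({2: 3, 3: -1})
before_flip = dict(volume.active_fat())  # what survives an interruption here
volume.flip()
volume.sync()
print(before_flip, volume.active_fat())
```

An interruption before flip leaves the active table exactly as it was, so the file system stays intact even though the staged write itself is lost, consistent with the guarantee described above.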

In one embodiment, the system maintains two FATs. A default TFAT write/file modification proceeds as follows. Initially the FATs are set up with FAT0 as a primary file allocation table and FAT1 as a secondary file allocation table. A write to a volume on a storage medium proceeds as described below with reference to process 400 as shown in the flowchart of FIG. 4.

FIG. 4 is a flowchart of an exemplary process 400 for writing data to the non-volatile memory 138 of FIG. 1A that includes the TFAT file system 200 of FIG. 2. In one embodiment, one or more computer readable media (e.g., 138, FIG. 1A) have stored thereon a plurality of instructions that, when executed by one or more processors (e.g., 110, FIG. 1A), causes the one or more processors to modify data represented by at least a first sector on the non-volatile storage medium such that the one or more processors perform acts to effect the process 400. The process 400 begins in a block 405.

In block 410, an application initiates a write operation to write data to the volume.

In block 420, the write triggers the memory manager 130 of FIG. 1A to write a new sector of the medium via block drivers. In one embodiment, the application writes a new sector of the storage medium via an atomic block write. In one embodiment, the memory manager 130 of FIG. 1A writes the new sector in response to an instruction to close the file. Writing data to modify a file to a new sector preserves all old data because the sector containing the old data is not overwritten by the new data.

In block 430, cluster chains are updated.

In block 440, used/unused sector information is written in FAT1 (216, FIG. 2). In one embodiment, the processor 110 enters file allocation data including data describing the new sector in a first file allocation table.

In block 450, a variable is set to a first value. In one embodiment, the variable is set to a first value configured to block access to the storage medium by first types of file systems and configured to permit access to the storage medium by second types of file systems, such as the TFAT described in this disclosure. In one embodiment, the first types of file systems may include FAT12, FAT16 or FAT32. In one embodiment, the first value disables conventional file systems from accessing the storage medium. In one embodiment, the variable corresponds to a number of FATs (NOF) field located in the boot sector of the volume.

In block 460, the FAT1 is copied to the FAT0 (214, FIG. 2). This synchronizes FAT1 and FAT0.

In block 470, the variable is reset to a second value. The second value indicates to a TFAT file system that the FAT0 is a last known good FAT. In one embodiment, the second value also enables conventional file systems to access the storage medium. In one embodiment, resetting the variable to a second value permits access to the storage medium by the first and second types of file systems.

In block 480, the clusters corresponding to the previous version of the newly-written data are “unfrozen”, that is, are marked as unallocated chains. The previous version of the file is thus recoverable until such time as the new data have been written, the FAT1 has been updated and FAT1 and FAT0 have been synchronized.

The process 400 then ends in a block 485.

In one embodiment, the variable of block 450 represents a number of FATs (NOF) field. In one embodiment, the first value for the variable is zero and the second value for the variable is two.
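By way of illustration only, the ordering of blocks 420-480 may be sketched as follows. This is a minimal in-memory model rather than the implementation of the disclosed system: the class Volume, its fields, the method write_single_cluster_file, the end-of-chain marker and the in-RAM "frozen" set are all hypothetical names introduced for the example, and the sketch handles only a file that occupies a single cluster.

```python
# Illustrative model of process 400; every name here is hypothetical.
NOF_TRANSACTION_IN_PROGRESS = 0   # first value: conventional FAT drivers are blocked
NOF_COMMITTED = 2                 # second value: FAT0 is the last known good FAT
FREE = 0
END_OF_CHAIN = 0x0FFFFFFF         # illustrative end-of-chain marker


class Volume:
    def __init__(self, cluster_count):
        self.nof = NOF_COMMITTED
        self.fat0 = [FREE] * cluster_count   # last known good FAT
        self.fat1 = [FREE] * cluster_count   # working FAT
        self.data = {}                       # cluster number -> payload
        self.frozen = set()                  # old clusters that must not be reused yet

    def allocate(self):
        # Cluster entries 0 and 1 are reserved, so the search starts at 2.
        for cluster in range(2, len(self.fat1)):
            if self.fat1[cluster] == FREE and cluster not in self.frozen:
                return cluster
        raise OSError("no free cluster")

    def write_single_cluster_file(self, old_cluster, payload):
        new_cluster = self.allocate()             # block 420: write to a new cluster
        self.data[new_cluster] = payload
        self.fat1[new_cluster] = END_OF_CHAIN     # blocks 430-440: update FAT1 only
        if old_cluster is not None:
            self.fat1[old_cluster] = FREE
            self.frozen.add(old_cluster)          # old data stay intact for now
        self.nof = NOF_TRANSACTION_IN_PROGRESS    # block 450
        self.fat0 = list(self.fat1)               # block 460: synchronize FAT0
        self.nof = NOF_COMMITTED                  # block 470
        self.frozen.clear()                       # block 480: unfreeze old clusters
        return new_cluster
```

In this sketch the old cluster is never overwritten; it merely becomes eligible for reuse once FAT0 and FAT1 agree.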

In one embodiment, the first two cluster entries of the second FAT table are reserved. All the bits in the second cluster entry are, by default, set to 1. When one of the highest two bits of the second cluster entry is set to 0, conventional desktop computers are likely to be triggered to perform a scandisk utility operation when the operating system is booted. However, it does not trigger any activity when the storage device is inserted and mounted.

This embodiment works well for hard-drive type media because a power failure in a hard drive during a write operation can corrupt a sector being written. Because there are two FAT tables, the other FAT table is still available when one of the FAT tables is corrupted, assuming that the block driver will return a read error if the sector is corrupted during a write operation.

At the end of each transaction, FAT1 is copied to FAT0 by a process described below with reference to an exemplary process 500 as shown in the flowchart of FIG. 5.

FIG. 5 is a flowchart of an exemplary process 500 for synchronizing TFAT volumes in the TFAT file system 200 of FIG. 2. The process 500 may be implemented by the processor 110 of FIG. 1A, for example, and begins in a block 505.

In block 510, the second cluster entry in FAT0 is set to a first value. In one embodiment, the first value is zero.

In block 520, FAT1 is copied to FAT0, resetting the second cluster entry to a second value. The first sector is copied last. In one embodiment, the second cluster entry is set to all ones. The process 500 then ends in a block 525.
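As a non-limiting sketch of blocks 510-520, each FAT may be modeled as a list of sectors, each sector being a list of cluster entries. The function name synchronize, the 32-bit entry width and the constant names are assumptions made for the example. Copying the first sector last means that the second cluster entry of FAT0 stays at zero until every other sector has been copied.

```python
# Illustrative model of process 500; names and entry width are assumptions.
ALL_ONES = 0xFFFFFFFF      # default value of the second cluster entry
SYNC_IN_PROGRESS = 0       # first value written in block 510


def synchronize(fat0_sectors, fat1_sectors):
    """Copy FAT1 onto FAT0, copying the first sector last."""
    # Block 510: mark FAT0 as being rewritten (second cluster entry, index 1).
    fat0_sectors[0][1] = SYNC_IN_PROGRESS
    # Block 520: copy from the last sector down to sector 0, so that the
    # marker is only restored once all other sectors already match FAT1.
    for index in range(len(fat1_sectors) - 1, -1, -1):
        fat0_sectors[index] = list(fat1_sectors[index])
```

If power is lost before sector 0 is copied, the zero marker remains in FAT0 and a later mount can detect the incomplete copy.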

FIG. 6 is a flowchart of an exemplary process 600 for identifying TFAT volumes and determining which FAT is the last known good FAT when a volume of non-volatile memory 138 is mounted in a system such as the computer system 100 of FIG. 1A. The process 600 may be implemented by the processor 110 of FIG. 1A, for example. The process 600 begins in a block 605.

In a query task 610, the process 600 determines a value for a number of FATs (NOF). When query task 610 determines that NOF is set to 2, the process 600 treats that volume as a non-TFAT volume and, in block 620, selects FAT0 as the last known good FAT. The process 600 then ends in a block 625. When query task 610 determines that NOF is not 2, control passes to query task 630.

When query task 630 determines that the second cluster entry of FAT0 is not 0, the process 600 treats that volume as a TFAT volume. In block 640, FAT0 is copied to FAT1. The process 600 then ends. When query task 630 determines that the second cluster entry of FAT0 is 0 or determines that the sector-read on the first sector of the FAT0 failed, control passes to block 650.

In block 650, FAT1 is copied to FAT0. The process 600 then ends in a block 655.
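The decisions of query tasks 610 and 630 may be summarized, purely as an illustrative sketch, by a function that returns the volume type, the FAT treated as last known good, and the FAT (if any) that is overwritten. The function name, its parameters and the string labels are hypothetical.

```python
# Illustrative model of process 600; names and return format are assumptions.
def select_last_known_good(nof, fat0_second_entry, fat0_read_ok=True):
    """Return (volume_type, last_known_good_fat, fat_to_overwrite)."""
    if nof == 2:
        # Query task 610 / block 620: FAT0 is used and nothing is copied.
        return ("non-TFAT", "FAT0", None)
    if fat0_read_ok and fat0_second_entry != 0:
        # Query task 630 / block 640: FAT0 is intact, so FAT0 is copied to FAT1.
        return ("TFAT", "FAT0", "FAT1")
    # Block 650: FAT0 was mid-update or unreadable, so FAT1 is copied to FAT0.
    return ("TFAT", "FAT1", "FAT0")
```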

In one embodiment, TFAT includes a registry setting to allow selection between setting NOF to first and second values or second cluster entry values in FAT0 to identify TFAT media and to determine which FAT to employ.

In one embodiment, this registry setting is bit 0x40000 in the “Flags” value of the conventional FAT registry key (“0x” signifies that the number is hexadecimal, i.e., base 16). When this bit is set, TFAT uses the second cluster entry in FAT0 for last known good FAT determination.
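For illustration, the value 0x40000 equals 2 to the 18th power, so the test reduces to a single bit mask. The constant and function names below are hypothetical, and how the "Flags" value is read from the registry is outside the scope of this sketch.

```python
# Illustrative bit test; the names are assumptions, the bit value is from the text.
USE_SECOND_CLUSTER_ENTRY = 0x40000   # equals 1 << 18


def uses_second_cluster_entry(flags):
    """True when the second cluster entry in FAT0, rather than NOF, is used."""
    return bool(flags & USE_SECOND_CLUSTER_ENTRY)
```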

In one embodiment, access to the storage medium via conventional file systems is blocked by setting a bit on the storage medium to a value that corresponds to an indication of a defective storage medium.

In one embodiment, the TFAT control module 160 of FIG. 1B causes the FATs, and possibly also the directory file, to be re-written for every file system write. A series of small file system writes compromises system performance because each write to the storage medium is transacted and the TFAT is updated for each of these write operations.

FIG. 7A is a flowchart of a process 700 for determining when to write data to non-volatile storage media (e.g., 138, FIG. 1A) using the TFAT file system 200 of FIG. 2. The process 700 may be implemented by the processor 110 of FIG. 1A, for example, and begins in a block 705.

The process 700 then, in a block 710, accumulates (e.g., in RAM 136) data to be written from a plurality of instructions to write data to the storage medium 138. In a block 720, a cumulative record of an amount of data to be written is maintained.

A query task 730 tests for presence of a first predetermined threshold condition. In one embodiment, the threshold condition is met at the time when the file is closed. In one embodiment, the threshold condition is met when a predetermined or adjustable amount of data to be written has been accumulated. When the amount of accumulated data is less than the predetermined threshold, control passes back to block 710 to await further write data commands.

When the query task 730 determines that the predetermined threshold condition has been met, the process 700 causes the processor 110 of FIG. 1A to write the accumulated data to the storage medium 138 (block 740). The process 700 then ends in a block 745.
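One way to picture blocks 710-740 is a small buffer that accumulates write requests in volatile memory and commits them when either threshold condition is met. The class DelayedCommitBuffer and the commit callback are hypothetical; the callback stands in for the transacted write to the storage medium.

```python
# Illustrative model of process 700; the class and callback are assumptions.
class DelayedCommitBuffer:
    def __init__(self, threshold_bytes, commit):
        self.threshold_bytes = threshold_bytes
        self.commit = commit        # stands in for the transacted medium write
        self.pending = []           # block 710: data accumulated in RAM
        self.pending_bytes = 0      # block 720: cumulative record of the amount

    def write(self, data):
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold_bytes:   # query task 730
            self.flush()

    def close(self):
        # Closing the file is the other threshold condition named in the text.
        if self.pending:
            self.flush()

    def flush(self):
        self.commit(b"".join(self.pending))              # block 740
        self.pending = []
        self.pending_bytes = 0
```

With a 10-byte threshold, for example, three writes of 4, 4 and 2 bytes would produce a single commit rather than three.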

In one embodiment, the “Delayed Commit” feature allows flexibility to choose whether or not to commit FAT tables at the time the file is closed. In one embodiment, the TFAT control module 160 of FIG. 1B causes the application to merge several small writes into a single write.

However, because a write can fail if there is not enough free storage space in the storage medium, storing very large data blocks (>500 KB) in one single write can result in failure. In order to avoid such write failure, the TFAT control module 160 finds enough free sectors in the volume of storage medium to be able to write a new sector for each sector of data to be written or modified.

Accordingly, in one embodiment, the TFAT control module 160 determines amounts of data to be written in response to individual write commands and accumulates these data until a predetermined threshold quantity of data to be written is achieved. In one embodiment, the threshold may be adjustable as a function of the amount of available storage on the storage medium as that amount fluctuates. In other words, when the amount of unallocated storage medium is small, the threshold may be reduced, while when the amount of unallocated storage medium is relatively large (at least compared to the amount of storage medium required for each write), the threshold may be increased.
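A threshold that tracks free space might, for example, be computed as a bounded fraction of the unallocated storage. The text does not specify a formula; the function name, the one-quarter fraction, and the use of 512 bytes and 500 KB as lower and upper bounds are assumptions chosen only to make the sketch concrete.

```python
# Illustrative only: the formula and all default values are assumptions.
def commit_threshold(free_bytes, fraction=0.25, floor=512, ceiling=500 * 1024):
    """Scale the commit threshold with free space, within fixed bounds."""
    return max(floor, min(ceiling, int(free_bytes * fraction)))
```

Under these assumed defaults the threshold shrinks as the medium fills and never exceeds the size above which a single write is described as risky.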

In one embodiment, an intermediate TFAT is created in volatile memory to keep track of the non-volatile memory write operations to be carried out, either when the file is closed or when the predetermined threshold is achieved. The intermediate TFAT is maintained at least until the FAT1 is updated.

In one embodiment, when a single block of data needs to be modified, the TFAT file system first reads the existing disk block into a system buffer. The TFAT file system then finds and allocates a new cluster on disk. The TFAT is then traversed to find any entries corresponding to the old cluster, and the new cluster is relinked to replace such entries. This completes the FAT chain modifications. Then the system buffer is “renamed” to correspond to the newly allocated cluster on disk. In one embodiment, it is also marked as “dirty,” which causes the system buffer to be written out to disk whenever the buffer is released (avoiding having to perform an immediate and duplicate write; the TFAT control module 160 can modify the buffer, and then write it all out to non-volatile storage at once).
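The read, allocate, relink, rename and mark-dirty sequence described above may be sketched as follows. The FAT is modeled as a dictionary mapping each cluster to its successor and the disk as a dictionary of cluster contents; Buffer, clone_cluster_for_write and release are hypothetical names, and the free cluster is supplied by the caller rather than searched for.

```python
# Illustrative model of the single-block modification; all names are assumptions.
class Buffer:
    def __init__(self, cluster, data):
        self.cluster = cluster
        self.data = bytearray(data)
        self.dirty = False


def clone_cluster_for_write(fat, disk, old_cluster, free_cluster):
    buf = Buffer(old_cluster, disk[old_cluster])   # read existing block into a buffer
    fat[free_cluster] = fat[old_cluster]           # new cluster inherits the successor
    for cluster, successor in list(fat.items()):   # relink entries pointing at the old one
        if successor == old_cluster:
            fat[cluster] = free_cluster
    buf.cluster = free_cluster                     # "rename" the buffer
    buf.dirty = True                               # written out only on release
    return buf


def release(buf, disk):
    if buf.dirty:
        disk[buf.cluster] = bytes(buf.data)        # one write, to the new cluster
        buf.dirty = False
```

Because the buffer is written only on release, any number of in-memory modifications result in a single write, and the old cluster on disk is left untouched.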

In another aspect, the TFAT system 200 of FIG. 2 can work with a write-back cache memory by flushing the cache memory when the data stored in the cache memory are written or committed to non-volatile memory. In other words, a flushing mechanism has been added to allow the TFAT file system 200 of FIG. 2 to flush a cache memory during the commit transaction process. For example, a write-back cache marks all sector writes that come to the cache memory as dirty, and then writes those dirty sectors to non-volatile memory, such as a flash memory or disk, at a later time, following which the portions of the cache memory are “flushed” or freed for other uses. In order for TFAT to operate effectively, a control mechanism needs to regulate when these dirty sectors in the cache memory are flushed. A flush request for flushing the cache memory is passed to the block driver, to ensure that any buffering that the block driver device is doing has been flushed. The commit transactions process 750 works as described below with respect to FIG. 7B.

FIG. 7B is a flowchart describing the process 750 for cache memory operation in a TFAT system 200. The process 750 begins in a block 755.

In a block 760, the process 750 calls a flush-cache instruction that acts on both the cache memory (e.g., the cache memory 112 of FIG. 1) and the TFAT files. The cache memory contains data in sectors that have been marked as being “dirty”.

In a block 765, the process 750 writes the “dirty” sectors to non-volatile memory.

In a block 770, the cache memory calls a block driver device to flush these “dirty” sectors. This ensures that any buffering that the block driver device is doing has also been flushed.

In a block 775, the FATs are synchronized to reflect the data that have been written to the non-volatile memory. The process 750 then ends.
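The ordering constraint of process 750, namely that dirty sectors reach the medium and the block driver is flushed before the FATs are synchronized, may be illustrated with the following sketch. WriteBackCache, commit_transaction and the two callbacks are hypothetical stand-ins for the cache, the block driver flush request and the FAT synchronization.

```python
# Illustrative model of process 750; the class and callbacks are assumptions.
class WriteBackCache:
    def __init__(self, medium, block_driver_flush):
        self.medium = medium                    # sector number -> data on the medium
        self.dirty = {}                         # sectors written but not yet flushed
        self.block_driver_flush = block_driver_flush

    def write(self, sector, data):
        self.dirty[sector] = data               # marked dirty, not yet on the medium

    def flush(self):
        for sector, data in sorted(self.dirty.items()):   # block 765
            self.medium[sector] = data
        self.dirty.clear()
        self.block_driver_flush()                          # block 770


def commit_transaction(cache, synchronize_fats):
    cache.flush()          # blocks 760-770: data must reach the medium first
    synchronize_fats()     # block 775: only then are the FATs synchronized
```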

In one embodiment, a WriteFileGather function allows multiple discontinuous pages to be written to discontinuous locations of a file. In TFAT, this can be done in an atomic fashion if the file is open in write-through mode. This may be used to flush multiple pages of data to various locations in a file all in one atomic operation.

In one embodiment, the approach taken is slightly different. WriteFile can write an arbitrary amount of data to an arbitrary location in a file. In one embodiment, a stream process is used to clone streams.

When there is an attempt to write to an existing part of a stream, this embodiment attempts to allocate enough space for the entire write, or uses the most contiguous space available. Since stream-based operations operate on “runs” (e.g., contiguous blocks of data on storage media such as disks), cloning is performed in the same fashion. An unallocated run of appropriate length is located, and this is termed a “parallel run”. For example, if data in a run corresponded to sectors 51-55, a parallel run might be 72-76.

After a parallel run has been allocated, it is linked in to the existing FAT chain for the file, and the stream's current run information is updated with this new information. The rest of the function call proceeds conventionally, except instead of writing to the old run of the file, data are written to the new, parallel run, and the original copy of the run is preserved on the storage medium. This only occurs for data composed of block-sized chunks of data that are block-aligned.

Thus, before any data are written to storage media such as disks, the portion to be written to is reallocated, and the structures updated, so the writes occur to new clusters. When a stream needs to be expanded (i.e., the write is occurring past EOF, the end of the file), then these new clusters are not cloned; there is no backup to preserve.
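Locating a parallel run amounts to finding the first unallocated stretch of the required length. The sketch below is one simple first-fit search under assumed inputs; the function name and the representation of allocation as a set of sector numbers are hypothetical, and the fallback to the most contiguous space available is not modeled.

```python
# Illustrative first-fit search for a parallel run; names are assumptions.
def find_parallel_run(allocated, length, total_sectors):
    """Return the first sector of an unallocated run of `length`, or None."""
    run_start, run_length = None, 0
    for sector in range(total_sectors):
        if sector in allocated:
            run_start, run_length = None, 0    # run broken by an allocated sector
            continue
        if run_start is None:
            run_start = sector
        run_length += 1
        if run_length == length:
            return run_start
    return None
```

If sectors 0-71 and sector 77 are allocated on a 100-sector volume, a five-sector request yields the run beginning at sector 72, matching the 72-76 example above.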

The strategy outlined by example with respect to processes 300-700 maintains a backup of the most recent “good” version of the FAT in case power is lost during sector writing or FAT updating. In one embodiment, when a power-on reset occurs, NOF=0 means that TFAT file systems will use FAT1 as the valid or last known good FAT; while NOF=2 means that TFAT file systems will use FAT0 as the valid FAT and similarly that desktop-compatible file systems should be able to use FAT0.

This embodiment allows compatibility with existing desktop systems (that do not comprehend TFAT) when a transaction has been completed and the NOF flag=2. It also prevents such a conventional desktop system from reading the volume when power has been lost in mid-transaction, i.e., after the NOF field was set to zero but prior to updating of FAT0 and/or resetting of the NOF field.

TFAT can be incorporated in and operate on all sorts of physical storage media. Non-removable media include NAND/NOR flash memory and hard drives. Removable media include Compact-Flash (CF) cards, Secure-Digital (SD) cards, floppy discs and other removable mass storage devices.

In one embodiment, a block driver module associated with the physical mass storage device employs atomic block write operations. In one embodiment, block size equals sector size, e.g., 512 bytes. In one embodiment, TFAT supports any block driver that performs atomic sector-size (512 bytes) disk I/O operations.

As used herein, “atomic” means that if a failure happens (due to power cycle or media removal) during a sector-sized write-operation, a read-operation on the same sector at a later time can only have the following three results:

1. Read returns old sector data.

2. Read returns new sector data.

3. Read returns failure.

For some types of NAND-flash media, only the first two results are possible. For hard-drive type media, all three results are possible. For general media and other types of block write module, at least one other possible outcome is that the read operation returns corrupted data. TFAT supports at least those systems and media where atomic block write operations are employed. In one embodiment, TFAT supports media employing transacted block modules.

Because the TFAT file system writes an entire new sector or file when data are modified in any file, TFAT may be slower than conventional FAT file systems. A system employing TFAT may be, for example, 1.05 to 2.5 times slower than a conventional FAT file system. In one embodiment, this ratio can be lowered by committing the write to the TFAT control module 160 when the file is closed instead of with every write to the file.

FIGS. 8A and 8B are block diagrams showing relationships between sectors forming a FAT chain for a given file, before and after a write operation. FIG. 8A illustrates a portion 800 of a hypothetical FAT chain for the file prior to a write operation. The portion 800 includes sector 22 (block 810), sector 55 (block 820), sector 500 (block 830), sector 300 (block 840) and sector 15 (block 850). FIG. 8B illustrates a portion 860 of the hypothetical FAT chain after the write operation, which updates the data contained in blocks 820, 830 and 840, but which does not modify the data contained in those blocks. Instead, sector 77 (block 870), sector 332 (block 880) and sector 11 (block 890) are allocated and written, and the FAT chain is updated to reflect the new file structure. In the event that the write process is interrupted by a power failure or other system disturbance, the data contained in the file prior to the write (blocks 810-850) are uncorrupted and thus are recoverable.
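The relinking of FIGS. 8A and 8B may be worked through with the sector numbers given above. The helper replace_in_chain is hypothetical, and the pairing of old to new sectors (55 to 77, 500 to 332, 300 to 11) is assumed from the order in which the sectors are listed.

```python
# Illustrative relinking of the hypothetical chain; the pairing is an assumption.
def replace_in_chain(chain, replacements):
    """Return a new chain with rewritten sectors swapped for their replacements."""
    return [replacements.get(sector, sector) for sector in chain]


before = [22, 55, 500, 300, 15]                            # FIG. 8A
after = replace_in_chain(before, {55: 77, 500: 332, 300: 11})
# after is [22, 77, 332, 11, 15]; `before` is left unchanged
```

The original list is left intact, which mirrors the way sectors 55, 500 and 300 remain on the medium until the transaction completes.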

Making the file size transaction safe is also a desirable feature, that is, adding the ability to make any change in file size by extending or shrinking the size of a file in a transaction-safe manner. FIG. 9 is a flowchart of a process 900 for making file size transaction safe. The process 900 begins in a block 905 with a write event.

In a query task 910, the process 900 determines when it is desired to make the file size transaction safe. In one embodiment, this is determined when the write is to a volume in which the file size is made transaction safe.

When the query task 910 determines that the file size need not be transaction safe, control passes to a block 915. In the block 915, the file write is transacted, but the file size is not made transaction safe (e.g., as described above), and the process 900 then ends.

When the query task 910 determines that the file size is to be transaction safe, control passes to a block 920.

In the block 920, the file contents (data) are written to the volume. The file contents may be written to the volume by transacting an atomic memory write to non-volatile memory media. Control then passes to a block 925. In the block 925, the FAT is updated, and the file size is transacted. In other words, the size of a file is stored in the directory entry for that file, so the directory entry has to be transacted. The process 900 then ends in a block 930.

As a result, when power is lost during the middle of extending or shrinking a file using the process 900, the file size recorded in the FAT is either the original size or the new size, and is not some size that is in between these two sizes. This also tends to impact performance, because additional data need to be handled via the TFAT system. As a result, making file size transaction safe is done when the TFAT volume is specifically configured to make file size data transaction safe.

FIG. 10 is a flowchart of an exemplary process 1000 for writing a new cluster in the TFAT file system 200 of FIG. 2, whereby a conventional TFAT copy cluster is not done when new data will replace/overwrite the existing data. The process 1000 begins in a block 1005.

In a block 1010, the process 1000 allocates a new cluster. Control then passes to a query task 1015.

In the query task 1015, the process 1000 determines when the new cluster represents a revision of an existing cluster. When the query task 1015 determines that the new cluster does not represent an over-write of an entire existing cluster, control passes to a block 1020. When the query task 1015 determines that the new cluster does represent an over-write of an entire existing cluster, control passes to a block 1025.

In the block 1020, the old cluster of data is copied to the cluster that was allocated in the block 1010. This corresponds to conventional TFAT cluster writing. Control then passes to the block 1025.

In the block 1025, the new cluster data are written to the cluster allocated in the block 1010. Control then passes to a query task 1030.

In the query task 1030, the process 1000 determines when further data are to be written. When the query task 1030 determines that additional clusters of data are to be written, control passes to a block 1035.

In the block 1035, a next cluster of data to be written is selected. Control then passes back to the block 1010, and the process 1000 iterates. When the query task 1030 determines that no additional clusters of data are to be written, the process 1000 ends in a block 1040.
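The saving offered by process 1000 is that the copy of block 1020 is skipped whenever a write covers a whole cluster. A minimal sketch under assumed inputs follows; write_clusters, the (old cluster, offset, data) tuple format and the allocate callback are hypothetical, and the returned copy count exists only to make the skipped copies visible.

```python
# Illustrative model of process 1000; names and input format are assumptions.
def write_clusters(disk, allocate, writes, cluster_size):
    """Apply (old_cluster, offset, data) writes; return (new_clusters, copies)."""
    new_clusters, copies = [], 0
    for old_cluster, offset, data in writes:               # block 1035: next cluster
        new_cluster = allocate()                           # block 1010
        whole = offset == 0 and len(data) == cluster_size  # query task 1015
        if whole:
            contents = bytearray(cluster_size)             # no copy is needed
        else:
            contents = bytearray(disk[old_cluster])        # block 1020: copy old data
            copies += 1
        contents[offset:offset + len(data)] = data         # block 1025: write new data
        disk[new_cluster] = bytes(contents)
        new_clusters.append(new_cluster)
    return new_clusters, copies
```

In either branch the old cluster is left unmodified, so the transaction-safe property is retained while the redundant read and copy are avoided.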

The TFAT discussed herein has been described in part in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.

For purposes of illustration, programs and other executable program components such as the file system are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computer, and are executed by the data processor(s) of the computer.

Alternatively, TFAT may be implemented in hardware or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) could be designed or programmed to carry out aspects of the TFAT file system.

Although TFAT has been described in language specific to structural features and/or methodological steps, it is to be understood that the recitation in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed subject matter.

What is claimed:
 1. A method comprising: maintaining a first file allocation table and a second file allocation table for a file system; synchronizing the first file allocation table and the second file allocation table resulting in the second file allocation table comprising a copy of the first file allocation table, the first file allocation table and the second file allocation table containing sector information for used sectors of the file system; setting a last known good indicator to indicate that the first file allocation table is a last known good file allocation table; initiating a write operation to write data to a file in the file system; and, thereafter: writing the data to one or more unused sectors in the file system; subsequent to writing the data to the one or more unused sectors in the file system, rendering an update to the second file allocation table; and maintaining the first file allocation table as the last known good file allocation table when the write operation is interrupted, wherein the last known good indicator is not reset to indicate that the second file allocation table is maintained as the last known good file allocation table.
 2. The method of claim 1 wherein: the write operation comprises overwriting old data of the file, and used sectors containing the old data of the file are preserved.
 3. The method of claim 1 further comprising storing data to be written to the file in cache memory prior to writing the data to the one or more unused sectors in the file system.